System and method for restricting data transfers and managing software components of distributed computers

ABSTRACT

A controller, referred to as the “BMonitor”, is situated on a computer. The BMonitor includes a plurality of filters that identify where data can be sent to and/or received from, such as another node in a co-location facility or a client computer coupled to the computer via the Internet. The BMonitor further receives and implements requests from external sources regarding the management of software components executing on the computer, allowing such external sources to initiate, terminate, debug, etc. software components on the computer. Additionally, the BMonitor operates as a trusted third party mediating interaction among multiple external sources managing the computer.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 09/695,820, filed Oct. 24, 2000, entitled “System and Method for Restricting Data Transfers and Managing Software Components of Distributed Computers”, which is hereby incorporated by reference herein.

TECHNICAL FIELD

This invention relates to computer system management. More particularly, the invention relates to restricting data transfers and managing software components of distributed computers.

BACKGROUND OF THE INVENTION

The Internet and its use have expanded greatly in recent years, and this expansion is expected to continue. One significant way in which the Internet is used is the World Wide Web (also referred to as the “web”), which is a collection of documents (referred to as “web pages”) that users can view or otherwise render and which typically include links to one or more other pages that the user can access. Many businesses and individuals have created a presence on the web, typically consisting of one or more web pages describing themselves, describing their products or services, identifying other information of interest, allowing goods or services to be purchased, etc.

Web pages are typically made available on the web via one or more web servers, a process referred to as “hosting” the web pages. Sometimes these web pages are freely available to anyone that requests to view them (e.g., a company's advertisements) and other times access to the web pages is restricted (e.g., a password may be necessary to access the web pages). Given the large number of people that may be requesting to view the web pages (especially in light of the global accessibility to the web), a large number of servers may be necessary to adequately host the web pages (e.g., the same web page can be hosted on multiple servers to increase the number of people that can access the web page concurrently). Additionally, because the web is geographically distributed and has non-uniformity of access, it is often desirable to distribute servers to diverse remote locations in order to minimize access times for people in diverse locations of the world. Furthermore, people tend to view web pages around the clock (again, especially in light of the global accessibility to the web), so servers hosting web pages should be kept functional 24 hours per day.

Managing a large number of servers, however, can be difficult. A reliable power supply is necessary to ensure the servers can run. Physical security is necessary to ensure that a thief or other mischievous person does not attempt to damage or steal the servers. A reliable Internet connection is required to ensure that the access requests will reach the servers. A proper operating environment (e.g., temperature, humidity, etc.) is required to ensure that the servers operate properly. Thus, “co-location facilities” have evolved which assist companies in handling these difficulties.

A co-location facility refers to a complex that can house multiple servers. The co-location facility typically provides a reliable Internet connection, a reliable power supply, and a proper operating environment. The co-location facility also typically includes multiple secure areas (e.g., cages) into which different companies can situate their servers. The collection of servers that a particular company situates at the co-location facility is referred to as a “server cluster”, even though in fact there may only be a single server at any individual co-location facility. The particular company is then responsible for managing the operation of the servers in its server cluster.

Such co-location facilities, however, also present problems. One problem is data security. Different companies (even competitors) can have server clusters at the same co-location facility. Care is required, in such circumstances, to ensure that data received from the Internet (or sent by a server in the server cluster) that is intended for one company is not routed to a server of another company situated at the co-location facility.

An additional problem is the management of the servers once they are placed in the co-location facility. Currently, a system administrator from a company is able to contact a co-location facility administrator (typically by telephone) and ask him or her to reset a particular server (typically by pressing a hardware reset button on the server, or powering off then powering on the server) in the event of a failure of (or other problem with) the server. This limited reset-only ability provides very little management functionality to the company. Alternatively, the system administrator from the company can physically travel to the co-location facility himself or herself and attend to the faulty server. Unfortunately, a significant amount of time can be wasted by the system administrator in traveling to the co-location facility to attend to a server. Thus, it would be beneficial to have an improved way to manage server computers at a co-location facility.

Additionally, the world is becoming populated with ever increasing numbers of individual user computers in the form of personal computers (PCs), personal digital assistants (PDAs), pocket computers, palm-sized computers, handheld computers, digital cellular phones, etc. Management of the software on these user computers can be very laborious and time consuming and is particularly difficult for the often non-technical users of these machines. Often a system administrator or technician must either travel to the remote location of the user's computer, or walk through management operations over a telephone. It would be further beneficial to have an improved way to manage remote computers at the user's location without user intervention.

The invention described below addresses these disadvantages, restricting data transfers and managing software components of distributed computers.

SUMMARY OF THE INVENTION

Restricting data transfers and managing software components in clusters of server computers located at a co-location facility is described herein.

According to one aspect, a controller (referred to as the “BMonitor”) is situated on a computer (e.g., each node in a co-location facility). The BMonitor includes a plurality of filters that identify where data can be sent to and/or received from, such as another node in the co-location facility or a client computer coupled to the computer via the Internet. These filters can then be modified, during operation of the computer, by one or more management devices coupled to the computer.

According to another aspect, a controller referred to as the “BMonitor” (situated on a computer) manages software components executing on that computer. Requests are received by the BMonitor from external sources and implemented by the BMonitor. Such requests can originate from a management console local to the computer or alternatively remote from the computer.

According to another aspect, a controller referred to as the “BMonitor” (situated on a computer) operates as a trusted third party mediating interaction among multiple management devices. The BMonitor maintains multiple ownership domains, each corresponding to a management device(s) and each having a particular set of rights that identify what types of management functions they can command the BMonitor to carry out. Only one ownership domain is the top-level domain at any particular time, and the top-level domain has a more expanded set of rights than any of the lower-level domains. The top-level domain can create new ownership domains corresponding to other management devices, and can also be removed and the management rights of its corresponding management device revoked at any time by a management device corresponding to a lower-level ownership domain. Each time a change of which ownership domain is the top-level ownership domain occurs, the computer's system memory can be erased so that no confidential information from one ownership domain is made available to devices corresponding to other ownership domains.
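By way of illustration only, the ownership-domain behavior summarized above can be sketched in Python. The class names, the `memory_erased` counter, and the assumption that a newly created domain becomes the top-level domain are hypothetical choices made for this sketch and are not taken from the described embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class OwnershipDomain:
    """One ownership domain: a management device plus the rights it holds."""
    device: str          # hypothetical identifier of the management device
    rights: frozenset    # management functions this domain may command


@dataclass
class DomainStack:
    """Ordered ownership domains; the last entry is the top-level domain."""
    domains: list = field(default_factory=list)
    memory_erased: int = 0   # counts simulated erasures of system memory

    def top(self):
        return self.domains[-1]

    def _erase_memory(self):
        # Stand-in for erasing system memory when the top-level domain changes.
        self.memory_erased += 1

    def create_domain(self, requester, device, rights):
        # Only the current top-level domain may create a new ownership domain.
        # Assumption: the new domain then becomes the top-level domain.
        if self.domains and requester != self.top().device:
            raise PermissionError("only the top-level domain can create domains")
        self.domains.append(OwnershipDomain(device, frozenset(rights)))
        self._erase_memory()

    def revoke_top(self, requester):
        # A device in a lower-level domain may remove the top-level domain.
        lower = [d.device for d in self.domains[:-1]]
        if requester not in lower:
            raise PermissionError("requester is not in a lower-level domain")
        self.domains.pop()
        self._erase_memory()

    def is_allowed(self, device, function):
        # A command is honored only if the device's domain holds that right.
        return any(d.device == device and function in d.rights
                   for d in self.domains)
```

In this sketch a landlord device would create the first domain, then create a tenant domain with a broader set of rights, and could later call `revoke_top` to remove the tenant; each change of the top-level domain increments the erasure counter.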

According to another aspect, the BMonitor is implemented in a more-privileged level than other software engines executing on the node, preventing other software engines from interfering with restrictions imposed by the BMonitor.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings. The same numbers are used throughout the figures to reference like components and/or features.

FIG. 1 shows a client/server network system and environment such as may be used with certain embodiments of the invention.

FIG. 2 shows a general example of a computer that can be used in accordance with certain embodiments of the invention.

FIG. 3 is a block diagram illustrating an exemplary co-location facility in more detail.

FIG. 4 is a block diagram illustrating an exemplary multi-tiered server cluster management architecture.

FIG. 5 is a block diagram illustrating an exemplary node of a co-location facility in more detail in accordance with certain embodiments of the invention.

FIG. 6 is a block diagram illustrating an exemplary set of ownership domains in accordance with certain embodiments of the invention.

FIG. 7 is a flow diagram illustrating the general operation of a BMonitor in accordance with certain embodiments of the invention.

FIG. 8 is a flowchart illustrating an exemplary process for handling outbound data requests in accordance with certain embodiments of the invention.

FIG. 9 is a flowchart illustrating an exemplary process for handling inbound data requests in accordance with certain embodiments of the invention.

DETAILED DESCRIPTION

FIG. 1 shows a client/server network system and environment such as may be used with certain embodiments of the invention. Generally, the system includes one or more (n) client computers 102, one or more (m) co-location facilities 104 each including multiple clusters of server computers (server clusters) 106, one or more management devices 110, and one or more separate (e.g., not included in a co-location facility) servers 112. The servers, clients, and management devices communicate with each other over a data communications network 108. The communications network in FIG. 1 comprises a public network 108 such as the Internet. Other types of communications networks might also be used, in addition to or in place of the Internet, including local area networks (LANs), wide area networks (WANs), etc. Data communications network 108 can be implemented in any of a variety of different manners, including wired and/or wireless communications media.

Communication over network 108 can be carried out using any of a wide variety of communications protocols. In one implementation, client computers 102 and server computers in clusters 106 can communicate with one another using the Hypertext Transfer Protocol (HTTP), in which web pages are hosted by the server computers and written in a markup language, such as the Hypertext Markup Language (HTML) or the eXtensible Markup Language (XML).

Management device 110 operates to manage software components of one or more computing devices located at a location remote from device 110. This management may also include restricting data transfers into and/or out of the computing device being managed. In the illustrated example of FIG. 1, management device 110 can remotely manage any one or more of: a client(s) 102, a server cluster(s) 106, or a server(s) 112. Any of a wide variety of computing devices can be remotely managed, including personal computers (PCs), network PCs, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, gaming consoles, Internet appliances, personal digital assistants (PDAs), pocket computers, palm-sized computers, handheld computers, digital cellular phones, etc. Remote management of a computing device is accomplished by communicating commands to the device via network 108, as discussed in more detail below.

In the discussion herein, embodiments of the invention are described in the general context of computer-executable instructions, such as program modules, being executed by one or more conventional personal computers. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that various embodiments of the invention may be practiced with other computer system configurations, including hand-held devices, gaming consoles, Internet appliances, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. In a distributed computer environment, program modules may be located in both local and remote memory storage devices.

Alternatively, embodiments of the invention can be implemented in hardware or a combination of hardware, software, and/or firmware. For example, all or part of the invention can be implemented in one or more application specific integrated circuits (ASICs) or programmable logic devices (PLDs).

FIG. 2 shows a general example of a computer 142 that can be used in accordance with certain embodiments of the invention. Computer 142 is shown as an example of a computer that can perform the functions of a client computer 102 of FIG. 1, a server computer or node in a co-location facility 104 of FIG. 1, a management device 110 of FIG. 1, a server 112 of FIG. 1, or a local or remote management console as discussed in more detail below.

Computer 142 includes one or more processors or processing units 144, a system memory 146, and a bus 148 that couples various system components including the system memory 146 to processors 144. The bus 148 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. The system memory includes read only memory (ROM) 150 and random access memory (RAM) 152. A basic input/output system (BIOS) 154, containing the basic routines that help to transfer information between elements within computer 142, such as during start-up, is stored in ROM 150.

Computer 142 further includes a hard disk drive 156 for reading from and writing to a hard disk, not shown, connected to bus 148 via a hard disk drive interface 157 (e.g., a SCSI, ATA, or other type of interface); a magnetic disk drive 158 for reading from and writing to a removable magnetic disk 160, connected to bus 148 via a magnetic disk drive interface 161; and an optical disk drive 162 for reading from or writing to a removable optical disk 164 such as a CD ROM, DVD, or other optical media, connected to bus 148 via an optical drive interface 165. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for computer 142. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 160 and a removable optical disk 164, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like, may also be used in the exemplary operating environment.

A number of program modules may be stored on the hard disk, magnetic disk 160, optical disk 164, ROM 150, or RAM 152, including an operating system 170, one or more application programs 172, other program modules 174, and program data 176. A user may enter commands and information into computer 142 through input devices such as keyboard 178 and pointing device 180. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are connected to the processing unit 144 through an interface 168 that is coupled to the system bus. A monitor 184 or other type of display device is also connected to the system bus 148 via an interface, such as a video adapter 186. In addition to the monitor, personal computers typically include other peripheral output devices (not shown) such as speakers and printers.

Computer 142 optionally operates in a networked environment using logical connections to one or more remote computers, such as a remote computer 188. The remote computer 188 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computer 142, although only a memory storage device 190 has been illustrated in FIG. 2. The logical connections depicted in FIG. 2 include a local area network (LAN) 192 and a wide area network (WAN) 194. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. In the described embodiment of the invention, remote computer 188 executes an Internet Web browser program (which may optionally be integrated into the operating system 170) such as the “Internet Explorer” Web browser manufactured and distributed by Microsoft Corporation of Redmond, Wash.

When used in a LAN networking environment, computer 142 is connected to the local network 192 through a network interface or adapter 196. When used in a WAN networking environment, computer 142 typically includes a modem 198 or other component for establishing communications over the wide area network 194, such as the Internet. The modem 198, which may be internal or external, is connected to the system bus 148 via an interface (e.g., a serial port interface 168). In a networked environment, program modules depicted relative to the personal computer 142, or portions thereof, may be stored in the remote memory storage device. It is to be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

Generally, the data processors of computer 142 are programmed by means of instructions stored at different times in the various computer-readable storage media of the computer. Programs and operating systems are typically distributed, for example, on floppy disks or CD-ROMs. From there, they are installed or loaded into the secondary memory of a computer. At execution, they are loaded at least partially into the computer's primary electronic memory. The invention described herein includes these and other various types of computer-readable storage media when such media contain instructions or programs for implementing the steps described below in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described below. Furthermore, certain sub-components of the computer may be programmed to perform the functions and steps described below. The invention includes such sub-components when they are programmed as described. In addition, the invention described herein includes data structures, described below, as embodied on various types of memory media.

For purposes of illustration, programs and other executable program components such as the operating system are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computer, and are executed by the data processor(s) of the computer.

FIG. 3 is a block diagram illustrating an exemplary co-location facility in more detail. Co-location facility 104 is illustrated including multiple nodes (also referred to as server computers) 210. Co-location facility 104 can include any number of nodes 210, and can easily include a number of nodes reaching into the thousands.

The nodes 210 are grouped together in clusters, referred to as server clusters (or node clusters). For ease of explanation and to avoid cluttering the drawings, only a single cluster 212 is illustrated in FIG. 3. Each server cluster includes nodes 210 that correspond to a particular customer of co-location facility 104. The nodes 210 of a server cluster can be physically isolated from the nodes 210 of other server clusters. This physical isolation can take different forms, such as separate locked cages or separate rooms at co-location facility 104. Physically isolating server clusters assures customers of co-location facility 104 that only they can physically access their nodes (other customers cannot).

A landlord/tenant relationship (also referred to as a lessor/lessee relationship) can also be established based on the nodes 210. The owner (and/or operator) of co-location facility 104 owns (or otherwise has rights to) the individual nodes 210, and thus can be viewed as a “landlord”. The customers of co-location facility 104 lease the nodes 210 from the landlord, and thus can be viewed as “tenants”. The landlord is typically not concerned with what types of data or programs are being stored at the nodes 210 by the tenant, but does impose boundaries on the clusters that prevent nodes 210 from different clusters from communicating with one another, as discussed in more detail below. Additionally, the nodes 210 provide assurances to the tenant that, although the nodes are only leased to the tenant, the landlord cannot access confidential information stored by the tenant.

Although physically isolated, nodes 210 of different clusters are often physically coupled to the same transport medium (or media) 211 that enables access to network connection(s) 216, and possibly application operations management console 242, discussed in more detail below. This transport medium can be wired or wireless.

As each node 210 can be coupled to a shared transport medium 211, each node 210 is configurable to restrict which other nodes 210 data can be sent to or received from. Given that a number of different nodes 210 may be included in a customer's (also referred to as tenant's) server cluster, the customer may want to be able to pass data between different nodes 210 within the cluster for processing, storage, etc. However, the customer will typically not want data to be passed to other nodes 210 that are not in the server cluster. Configuring each node 210 in the cluster to restrict which other nodes 210 data can be sent to or received from allows a boundary for the server cluster to be established and enforced. Establishment and enforcement of such server cluster boundaries prevents customer data from being erroneously or improperly forwarded to a node that is not part of the cluster.
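As a minimal sketch of the per-node restriction just described, and not a definitive implementation, the following Python fragment models a cluster boundary as a set of node addresses and drops outbound data addressed to nodes outside that set. The `ClusterBoundary` class, the `filter_outbound` function, and the addresses used are hypothetical. Only node-to-node traffic within the facility is modeled; traffic to client computers over the Internet is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterBoundary:
    """Node addresses belonging to one server cluster (hypothetical model)."""
    members: frozenset

    def may_exchange(self, local: str, peer: str) -> bool:
        # Data may be sent or received only when both endpoints are in the cluster.
        return local in self.members and peer in self.members


def filter_outbound(boundary, local, packets):
    """Keep only the (destination, payload) pairs that stay inside the boundary."""
    return [(dst, data) for dst, data in packets
            if boundary.may_exchange(local, dst)]


if __name__ == "__main__":
    boundary = ClusterBoundary(frozenset({"10.0.0.1", "10.0.0.2"}))
    queued = [("10.0.0.2", b"in-cluster"), ("10.0.1.9", b"other tenant")]
    # Only the packet addressed to a node inside the cluster is kept.
    print(filter_outbound(boundary, "10.0.0.1", queued))
```

The same membership test could be applied to inbound data by checking the source address instead of the destination.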

These initial boundaries established by the landlord prevent communication between nodes 210 of different customers, thereby ensuring that each customer's data can be passed only to other nodes 210 of that customer. The customer itself may also further define sub-boundaries within its cluster, establishing sub-clusters of nodes 210 out of which (or into which) data cannot be communicated, whether to or from other nodes in the cluster. The customer is able to add, modify, remove, etc. such sub-cluster boundaries at will, but only within the boundaries defined by the landlord (that is, the cluster boundaries). Thus, the customer is not able to alter boundaries in a manner that would allow communication to or from a node 210 to extend to another node 210 that is not within the same cluster.
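The relationship between landlord-defined cluster boundaries and customer-defined sub-boundaries can be illustrated with a short Python sketch. The function name `effective_peers` and its set-based model are assumptions made for illustration. The key point is that the customer's sub-cluster is intersected with the landlord's cluster, so a customer setting can narrow but never widen the set of reachable nodes.

```python
def effective_peers(cluster, sub_cluster, node):
    """Return the nodes that `node` may exchange data with.

    cluster:     set of nodes fixed by the landlord (the cluster boundary).
    sub_cluster: set of nodes chosen by the customer as one sub-cluster.
    """
    if node not in cluster:
        # Nodes outside the cluster never gain access through customer settings.
        return set()
    # Intersecting with the cluster discards any node the customer listed
    # that lies outside the landlord's boundary.
    inside = set(sub_cluster) & set(cluster)
    if node in inside:
        peers = inside                 # confined to the sub-cluster
    else:
        peers = set(cluster) - inside  # cannot reach into the sub-cluster
    return peers - {node}
```

For example, with a cluster of nodes `a` through `d` and a customer sub-cluster listing `a`, `b`, and an out-of-cluster node `x`, node `a` can reach only `b`, node `c` can reach only `d`, and `x` is ignored.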

Co-location facility 104 supplies reliable power 214 and reliable network connection(s) 216 (e.g., to network 108 of FIG. 1) to each of the nodes 210. Power 214 and network connection(s) 216 are shared by all of the nodes 210, although alternatively separate power 214 and network connection(s) 216 may be supplied to nodes 210 or groupings (e.g., clusters) of nodes. Any of a wide variety of conventional mechanisms for supplying reliable power can be used to supply reliable power 214, such as power received from a public utility company along with backup generators in the event of power failures, redundant generators, batteries, fuel cells, or other power storage mechanisms, etc. Similarly, any of a wide variety of conventional mechanisms for supplying a reliable network connection can be used to supply network connection(s) 216, such as redundant connection transport media, different types of connection media, different access points (e.g., different Internet access points, different Internet service providers (ISPs), etc.).

In certain embodiments, nodes 210 are leased or sold to customers by the operator or owner of co-location facility 104 along with the space (e.g., locked cages) and service (e.g., access to reliable power 214 and network connection(s) 216) at facility 104. In other embodiments, space and service at facility 104 may be leased to customers while one or more nodes are supplied by the customer.

Management of each node 210 is carried out in a multiple-tiered manner. FIG. 4 is a block diagram illustrating an exemplary multi-tiered management architecture. The multi-tiered architecture includes three tiers: a cluster operations management tier 230, an application operations management tier 232, and an application development tier 234. Cluster operations management tier 230 is implemented locally at the same location as the server(s) being managed (e.g., at a co-location facility) and involves managing the hardware operations of the server(s). In the illustrated example, cluster operations management tier 230 is not concerned with what software components are executing on the nodes 210, but only with the continuing operation of the hardware of nodes 210 and establishing any boundaries between clusters of nodes.

The application operations management tier 232, on the other hand, is implemented at a remote location other than where the server(s) being managed are located (e.g., other than the co-location facility), but from a client computer that is still communicatively coupled to the server(s). The application operations management tier 232 involves managing the software operations of the server(s) and defining any sub-boundaries within server clusters. The client can be coupled to the server(s) in any of a variety of manners, such as via the Internet or via a dedicated (e.g., dial-up) connection. The client can be coupled continually to the server(s), or alternatively sporadically (e.g., only when needed for management purposes).

The application development tier 234 is implemented on another client computer at a location other than the server(s) (e.g., other than at the co-location facility) and involves development of software components or engines for execution on the server(s). Alternatively, current software on a node 210 at co-location facility 104 could be accessed by a remote client to develop additional software components or engines for the node. Although the client at which application development tier 234 is implemented is typically a different client than that at which application operations management tier 232 is implemented, tiers 232 and 234 could be implemented (at least in part) on the same client.

Although only three tiers are illustrated in FIG. 4, alternatively the multi-tiered architecture could include different numbers of tiers. For example, the application operations management tier may be separated into two tiers, each having different (or overlapping) responsibilities, resulting in a 4-tiered architecture. The management at these tiers may occur from the same place (e.g., a single application operations management console may be shared), or alternatively from different places (e.g., two different operations management consoles).

Returning to FIG. 3, co-location facility 104 includes a cluster operations management console for each server cluster. In the example of FIG. 3, cluster operations management console 240 corresponds to cluster 212 and may be, for example, a management device 110 of FIG. 1. Cluster operations management console 240 implements cluster operations management tier 230 (FIG. 4) for cluster 212 and is responsible for managing the hardware operations of nodes 210 in cluster 212. Cluster operations management console 240 monitors the hardware in cluster 212 and attempts to identify hardware failures. Any of a wide variety of hardware failures can be monitored for, such as processor failures, bus failures, memory failures, etc. Hardware operations can be monitored in any of a variety of manners. For example, cluster operations management console 240 can send test messages or control signals to the nodes 210 that require the use of particular hardware in order to respond (no response or an incorrect response indicates failure). As another example, nodes 210 can periodically send to cluster operations management console 240 messages or control signals that require the use of particular hardware to generate (not receiving such a message or control signal within a specified amount of time indicates failure). Alternatively, cluster operations management console 240 may make no attempt to identify what type of hardware failure has occurred, but rather simply determine that a failure has occurred.
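The periodic-message form of monitoring described above can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the `HeartbeatMonitor` class, its method names, and the use of numeric timestamps in seconds are hypothetical and are not taken from the described embodiments.

```python
class HeartbeatMonitor:
    """Tracks the last time each node reported in (hypothetical helper)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def record(self, node_id, timestamp):
        # Called whenever a periodic message arrives from a node.
        self.last_seen[node_id] = timestamp

    def failed_nodes(self, now):
        # A node is presumed failed when no message arrived within the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A console could call `record` on receipt of each message and poll `failed_nodes` at intervals. The sketch reports only that a failure has occurred and makes no attempt to identify its type.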

Once a hardware failure is detected, cluster operations management console 240 acts to correct the failure. The action taken by cluster operations management console 240 can vary based on the hardware as well as the type of failure, and can vary for different server clusters. The corrective action can be notification of an administrator (e.g., a flashing light, an audio alarm, an electronic mail message, calling a cell phone or pager, etc.), or an attempt to physically correct the problem (e.g., reboot the node, activate another backup node to take its place, etc.).

Cluster operations management console 240 also establishes cluster boundaries within co-location facility 104. The cluster boundaries established by console 240 prevent nodes 210 in one cluster (e.g., cluster 212) from communicating with nodes in another cluster (e.g., any node not in cluster 212), while at the same time not interfering with the ability of nodes 210 within a cluster to communicate with other nodes within that cluster. These boundaries provide security for the tenants' data, allowing them to know that their data cannot be communicated to other tenants' nodes 210 at facility 104 even though network connection 216 may be shared by the tenants.

In the illustrated example, each cluster of co-location facility 104 includes a dedicated cluster operations management console. Alternatively, a single cluster operations management console may correspond to, and manage hardware operations of, multiple server clusters. According to another alternative, multiple cluster operations management consoles may correspond to, and manage hardware operations of, a single server cluster. Such multiple consoles can manage a single server cluster in a shared manner, or one console may operate as a backup for another console (e.g., providing increased reliability through redundancy, to allow for maintenance, etc.).

An application operations management console 242 is also communicatively coupled to co-location facility 104. Application operations management console 242 may be, for example, a management device 110 of FIG. 1. Application operations management console 242 is located at a location remote from co-location facility 104 (that is, not within co-location facility 104), typically being located at the offices of the customer. A different application operations management console 242 corresponds to each server cluster of co-location facility 104, although alternatively multiple consoles 242 may correspond to a single server cluster, or a single console 242 may correspond to multiple server clusters. Application operations management console 242 implements application operations management tier 232 (FIG. 4) for cluster 212 and is responsible for managing the software operations of nodes 210 in cluster 212 as well as securing sub-boundaries within cluster 212.

Application operations management console 242 monitors the software in cluster 212 and attempts to identify software failures. Any of a wide variety of software failures can be monitored for, such as application processes or threads that are “hung” or otherwise non-responsive, an error in execution of application processes or threads, etc. Software operations can be monitored in any of a variety of manners (similar to the monitoring of hardware operations discussed above), such as application operations management console 242 sending test messages or control signals to particular processes or threads executing on the nodes 210 that require the use of particular routines in order to respond (no response or an incorrect response indicates failure), having processes or threads executing on nodes 210 periodically send to application operations management console 242 messages or control signals whose generation requires the use of particular software routines (not receiving such a message or control signal within a specified amount of time indicates failure), etc. Alternatively, application operations management console 242 may make no attempt to identify what type of software failure has occurred, but rather simply determine that a failure has occurred.

Once a software failure is detected, application operations management console 242 acts to correct the failure. The action taken by application operations management console 242 can vary based on the hardware as well as the type of failure, and can vary for different server clusters. The corrective action can be notification of an administrator (e.g., a flashing light, an audio alarm, an electronic mail message, calling a cell phone or pager, etc.), or an attempt to correct the problem (e.g., reboot the node, re-load the software component or engine image, terminate and re-execute the process, etc.).

Thus, the management of a node 210 is distributed across multiple managers, regardless of the number of other nodes (if any) situated at the same location as the node 210. The multi-tiered management allows the hardware operations management to be separated from the application operations management, allowing two different consoles (each under the control of a different entity) to share the management responsibility for the node.

FIG. 5 is a block diagram illustrating an exemplary remotely managed node in more detail in accordance with certain embodiments of the invention. Node 248 can be a node 210 of a co-location facility, or alternatively a separate device (e.g., a client 102 or server 112 of FIG. 1). Node 248 includes a monitor 250, referred to as the “BMonitor”, and a plurality of software components or engines 252, and is coupled to (or alternatively incorporates) a mass storage device 262. In the illustrated example, node 248 is a computing device having a processor(s) that supports multiple privilege levels (e.g., rings in an x86 architecture processor). In the illustrated example, these privilege levels are referred to as rings, although alternate implementations using different processor architectures may use different nomenclature. The multiple rings provide a set of prioritized levels that software can execute at, often including 4 levels (Rings 0, 1, 2, and 3). Ring 0 is typically referred to as the most privileged ring. Software processes executing in Ring 0 can typically access more features (e.g., instructions) than processes executing in less privileged rings. Furthermore, a process executing in a particular ring cannot alter code or data in a higher priority ring. In the illustrated example, BMonitor 250 executes in Ring 0, while engines 252 execute in Ring 1 (or alternatively Rings 2 and/or 3). Thus, the code or data of BMonitor 250 (executing in Ring 0) cannot be altered directly by engines 252 (executing in Ring 1). Rather, any such alterations would have to be made by an engine 252 requesting BMonitor 250 to make the alteration (e.g., by sending a message to BMonitor 250, invoking a function of BMonitor 250, etc.). Implementing BMonitor 250 in Ring 0 protects BMonitor 250 from a rogue or malicious engine 252 that tries to bypass any restrictions imposed by BMonitor 250.

Alternatively, BMonitor 250 may be implemented in other manners that protect it from a rogue or malicious engine 252. For example, node 248 may include multiple processors: one (or more) processor(s) for executing engines 252, and another processor(s) to execute BMonitor 250. By allowing only BMonitor 250 to execute on a processor(s) separate from the processor(s) on which engines 252 are executing, BMonitor 250 can be effectively shielded from engines 252.

BMonitor 250 is the fundamental control module of node 248; it controls (and optionally includes) both the network interface card and the memory manager. By controlling the network interface card (which may be separate from BMonitor 250, or alternatively BMonitor 250 may be incorporated on the network interface card), BMonitor 250 can control data received by and sent by node 248. By controlling the memory manager, BMonitor 250 controls the allocation of memory to engines 252 executing in node 248 and thus can assist in preventing rogue or malicious engines from interfering with the operation of BMonitor 250.

Although various aspects of node 248 may be under control of BMonitor 250 (e.g., the network interface card), BMonitor 250 still makes at least part of such functionality available to engines 252 executing on the node 248. BMonitor 250 provides an interface (e.g., via controller 254 discussed in more detail below) via which engines 252 can request access to the functionality, such as to send data out to another node 248 within a co-location facility or on the Internet. These requests can take any of a variety of forms, such as sending messages, calling a function, etc.

BMonitor 250 includes controller 254, network interface 256, one or more filters 258, one or more keys 259, and a BMonitor Control Protocol (BMCP) module 260. Network interface 256 provides the interface between node 248 and the network (e.g., network 108 of FIG. 1). Filters 258 identify other nodes 248 in a co-location facility (and/or other sources or targets, e.g., coupled to Internet 108 of FIG. 1) that data can (or alternatively cannot) be sent to and/or received from. The nodes or other sources/targets can be identified in any of a wide variety of manners, such as by network address (e.g., Internet Protocol (IP) address), some other globally unique identifier, a locally unique identifier (e.g., a numbering scheme proprietary or local to co-location facility 104), etc.

Filters 258 can fully restrict access to a node (e.g., no data can be received from or sent to the node), or partially restrict access to a node. Partial access restriction can take different forms. For example, a node may be restricted so that data can be received from the node but not sent to the node (or vice versa). By way of another example, a node may be restricted so that only certain types of data (e.g., communications in accordance with certain protocols, such as HTTP) can be received from and/or sent to the node. Filtering based on particular types of data can be implemented in different manners, such as by communicating data in packets with header information that indicates the type of data included in the packet.
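As a minimal sketch of the partial restrictions just described, a filter might be represented as a record carrying a direction restriction and an optional protocol restriction. The class `Filter`, its field names, and the example addresses are assumptions made for illustration; the description above does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Filter:
    """Permit rule for one peer; all field names are illustrative assumptions."""
    peer: str                                    # e.g., an IP address
    allow_inbound: bool = True                   # data may be received from the peer
    allow_outbound: bool = True                  # data may be sent to the peer
    protocols: Optional[FrozenSet[str]] = None   # None means any type of data

    def permits(self, peer: str, direction: str, protocol: str) -> bool:
        if direction not in ("in", "out"):
            raise ValueError("direction must be 'in' or 'out'")
        if peer != self.peer:
            return False
        allowed = self.allow_inbound if direction == "in" else self.allow_outbound
        return allowed and (self.protocols is None or protocol in self.protocols)


# Data can be received from this peer but not sent to it.
receive_only = Filter("10.0.0.7", allow_outbound=False)
# Only HTTP traffic may be exchanged with this peer, in either direction.
http_only = Filter("10.0.0.8", protocols=frozenset({"HTTP"}))
```

Under this representation a fully restricted node is simply one for which no filter permits traffic in either direction.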

Filters 258 can be added by one or more management devices 110 of FIG. 1 or either of application operations management console 242 or cluster operations management console 240 of FIG. 3. In the illustrated example, filters added by cluster operations management console 240 (to establish cluster boundaries) restrict full access to nodes (e.g., any access to another node can be prevented) whereas filters added by application operations management console 242 (to establish sub-boundaries within a cluster) or management device 110 can restrict either full access to nodes or partial access.

Controller 254 also imposes some restrictions on what filters can be added to filters 258. In the multi-tiered management architecture illustrated in FIGS. 3 and 4, controller 254 allows cluster operations management console 240 to add any filters it desires (which will define the boundaries of the cluster). However, controller 254 restricts application operations management console 242 to adding only filters that are at least as restrictive as those added by console 240. If console 242 attempts to add a filter that is less restrictive than those added by console 240 (in which case the sub-boundary may extend beyond the cluster boundaries), controller 254 refuses to add the filter (or alternatively may modify the filter so that it is not less restrictive). By imposing such a restriction, controller 254 can ensure that the sub-boundaries established at the application operations management level do not extend beyond the cluster boundaries established at the cluster operations management level.
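One way to picture the "at least as restrictive" test is to model a boundary as the set of peers with which communication is permitted, so that a sub-boundary is acceptable only if it is a subset of the cluster boundary. This set-based model, the function name `add_application_filter`, and the `clamp` flag are assumptions made for this sketch, not details of the described embodiments.

```python
def add_application_filter(requested, cluster_boundary, clamp=False):
    """Return the peer set to install for an application-level filter.

    Both arguments are collections of permitted peer addresses (an assumed model).
    """
    requested = frozenset(requested)
    cluster_boundary = frozenset(cluster_boundary)
    if requested <= cluster_boundary:
        # At least as restrictive as the cluster boundary: accept unchanged.
        return requested
    if clamp:
        # Alternative behavior: modify the filter so it is not less restrictive.
        return requested & cluster_boundary
    # Default behavior: refuse to add the filter.
    raise PermissionError("filter would extend beyond the cluster boundary")


boundary = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
accepted = add_application_filter({"10.0.0.1", "10.0.0.2"}, boundary)
clamped = add_application_filter({"10.0.0.1", "10.9.9.9"}, boundary, clamp=True)
```

The two branches correspond to the two behaviors named in the paragraph above: refusing the filter, or modifying it so that it no longer reaches outside the cluster.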

Controller 254, using one or more filters 258, operates to restrict data packets sent from node 248 and/or received by node 248. All data intended for an engine 252, or sent by an engine 252, to another node, is passed through network interface 256 and filters 258. Controller 254 applies the filters 258 to the data, comparing the target of the data (e.g., typically identified in a header portion of a packet including the data) to acceptable (and/or restricted) nodes (and/or network addresses) identified in filters 258. If filters 258 indicate that the target of the data is acceptable, then controller 254 allows the data to pass through to the target (either into node 248 or out from node 248). However, if filters 258 indicate that the target of the data is not acceptable, then controller 254 prevents the data from passing through to the target. Controller 254 may return an indication to the source of the data that the data cannot be passed to the target, or may simply ignore or discard the data.
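The pass-or-drop decision described above might be sketched as follows, assuming an allow-list of peer addresses and packets represented as dictionaries with `src` and `dst` header fields. The function `filter_packet` and the packet layout are hypothetical; a deny-list variant would invert the final membership test.

```python
def filter_packet(local_addr, allowed_peers, packet):
    """Return True if the packet may pass, False if it should be discarded."""
    if packet["src"] == local_addr:
        # Outbound: the peer is the target named in the packet header.
        peer = packet["dst"]
    elif packet["dst"] == local_addr:
        # Inbound: the peer is the node that sent the data.
        peer = packet["src"]
    else:
        # Neither sent by nor addressed to this node: do not pass it to an engine.
        return False
    return peer in allowed_peers


allowed = {"10.0.0.2", "10.0.0.3"}
outbound_ok = filter_packet("10.0.0.1", allowed, {"src": "10.0.0.1", "dst": "10.0.0.2"})
inbound_blocked = filter_packet("10.0.0.1", allowed, {"src": "10.9.9.9", "dst": "10.0.0.1"})
```

This sketch simply discards disallowed data; returning an indication to the source, as the paragraph above also permits, would be a small extension.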

The application of filters 258 to the data by controller 254 allows the boundary restrictions of a server cluster (FIG. 3) to be imposed. Filters 258 can be programmed (e.g., by application operations management console 242 of FIG. 3) with the node addresses of all the nodes within the server cluster (e.g., cluster 212). Controller 254 then prevents data received from any node not within the server cluster from being passed through to an engine 252, and similarly prevents any data being sent to a node other than one within the server cluster from being sent. Similarly, data received from Internet 108 (FIG. 1) can identify a target node 248 (e.g., by IP address), so that controller 254 of any node other than the target node will prevent the data from being passed through to an engine 252. Furthermore, as filters 258 can be readily modified by cluster operations management console 240, server cluster boundaries can be easily changed to accommodate changes in the server cluster (e.g., addition of nodes to and/or removal of nodes from the server cluster).

BMCP module 260 implements the Dynamic Host Configuration Protocol (DHCP), allowing BMonitor 250 (and thus node 248) to obtain an IP address from a DHCP server (e.g., cluster operations management console 240 of FIG. 3). During an initialization process for node 248, BMCP module 260 requests an IP address from the DHCP server, which in turn provides the IP address to module 260. Additional information regarding DHCP is available from Microsoft Corporation of Redmond, Wash.

Software engines 252 include any of a wide variety of conventional software components. Examples of engines 252 include an operating system (e.g., Windows NT®), a load balancing server component (e.g., to balance the processing load of multiple nodes 248), a caching server component (e.g., to cache data and/or instructions from another node 248 or received via the Internet), a storage manager component (e.g., to manage storage of data from another node 248 or received via the Internet), etc. In one implementation, each of the engines 252 is a protocol-based engine, communicating with BMonitor 250 and other engines 252 via messages and/or function calls without requiring the engines 252 and BMonitor 250 to be written using the same programming language.

Controller 254, in conjunction with loader 264, is responsible for controlling the execution of engines 252. This control can take different forms, including beginning or initiating execution of an engine 252, terminating execution of an engine 252, re-loading an image of an engine 252 from a storage device, debugging execution of an engine 252, etc. Controller 254 receives instructions from application operations management console 242 of FIG. 3 or a management device(s) 110 of FIG. 1 regarding which of these control actions to take and when to take them. In the event that execution of an engine 252 is to be initiated (including re-starting an engine whose execution was recently terminated), controller 254 communicates with loader 264 to load an image of the engine 252 from a storage device (e.g., device 262, ROM, etc.) into the memory (e.g., RAM) of node 248. Loader 264 operates in a conventional manner to copy the image of the engine from the storage device into memory and initialize any necessary operating system parameters to allow execution of the engine 252. Thus, the control of engines 252 is actually managed by a remote device, not locally at the same location as the node 248 being managed.

Controller 254 also provides an interface via which application operations management console 242 of FIG. 3 or a management device(s) 110 of FIG. 1 can identify filters to add to (and/or remove from) filters 258.

Controller 254 also includes an interface via which cluster operations management console 240 of FIG. 3 can communicate commands to controller 254. Different types of hardware operation oriented commands can be communicated to controller 254 by cluster operations management console 240, such as re-booting the node, shutting down the node, placing the node in a low-power state (e.g., in a suspend or standby state), changing cluster boundaries, changing encryption keys (if any), etc.

Controller 254 further optionally provides encryption support for BMonitor 250, allowing data to be stored securely on mass storage device 262 (e.g., a magnetic disk, an optical disk, etc.) and secure communications to occur between node 248 and an operations management console (e.g., console 240 or 242 of FIG. 3) or other management device (e.g., management device 110 of FIG. 1).

Controller 254 maintains multiple encryption keys 259, which can include a variety of different keys such as symmetric keys (secret keys used in secret key cryptography), public/private key pairs (for public key cryptography), etc. to be used in encrypting and/or decrypting data.

BMonitor 250 makes use of public key cryptography to provide secure communications between node 248 and the management consoles (e.g., consoles 240 or 242 of FIG. 3) or other management devices (e.g., management device(s) 110 of FIG. 1). Public key cryptography is based on a key pair, including both a public key and a private key, and an encryption algorithm. The encryption algorithm can encrypt data based on the public key such that it cannot be decrypted efficiently without the private key. Thus, communications from the public-key holder can be encrypted using the public key, allowing only the private-key holder to decrypt the communications. Any of a variety of public key cryptography techniques may be used, such as the well-known RSA (Rivest, Shamir, and Adleman) encryption technique. For a basic introduction to cryptography, the reader is directed to a text written by Bruce Schneier and entitled “Applied Cryptography: Protocols, Algorithms, and Source Code in C,” published by John Wiley & Sons with copyright 1994 (or second edition with copyright 1996).

BMonitor 250 is initialized to include a public/private key pair for both the landlord and the tenant. These key pairs can be generated by BMonitor 250, or alternatively by some other component and stored within BMonitor 250 (with that other component being trusted to destroy its knowledge of the key pair). As used herein, U refers to a public key and R refers to a private key. The public/private key pair for the landlord is referred to as (U_(L), R_(L)), and the public/private key pair for the tenant is referred to as (U_(T), R_(T)). BMonitor 250 makes the public keys U_(L) and U_(T) available to the landlord, but keeps the private keys R_(L) and R_(T) secret. In the illustrated example, BMonitor 250 never divulges the private keys R_(L) and R_(T), so both the landlord and the tenant can be assured that no entity other than the BMonitor 250 can decrypt information that they encrypt using their public keys (e.g., via cluster operations management console 240 and application operations management console 242 of FIG. 3, respectively).

Once the landlord has the public keys U_(L) and U_(T), the landlord can assign node 248 to a particular tenant, giving that tenant the public key U_(T). Use of the public key U_(T) allows the tenant to encrypt communications to BMonitor 250 that only BMonitor 250 can decrypt (using the private key R_(T)). Although not required, a prudent initial step for the tenant is to request that BMonitor 250 generate a new public/private key pair (U_(T), R_(T)). In response to such a request, controller 254 or a dedicated key generator (not shown) of BMonitor 250 generates a new public/private key pair in any of a variety of well-known manners, stores the new key pair as the tenant key pair, and returns the new public key U_(T) to the tenant. By generating a new key pair, the tenant is assured that no other entity, including the landlord, is aware of the tenant public key U_(T). Additionally, the tenant may also have new key pairs generated at subsequent times.

Having a public/private key pair in which BMonitor 250 stores the private key and the tenant knows the public key allows information to be securely communicated from the tenant to BMonitor 250. In order to ensure that information can be securely communicated from BMonitor 250 to the tenant, an additional public/private key pair is generated by the tenant and the public key portion is communicated to BMonitor 250. Any communications from BMonitor 250 to the tenant can thus be encrypted using this public key portion, and can be decrypted only by the holder of the corresponding private key (that is, only by the tenant).

BMonitor 250 also maintains, as one of keys 259, a disk key which is generated based on one or more symmetric keys (symmetric keys refer to secret keys used in secret key cryptography). The disk key, also a symmetric key, is used by BMonitor 250 to store information in mass storage device 262. BMonitor 250 keeps the disk key secure, using it only to encrypt data that the node stores on mass storage device 262 and to decrypt data that the node retrieves from mass storage device 262 (thus there is no need for any other entities, including any management device, to have knowledge of the disk key).

Use of the disk key ensures that data stored on mass storage device 262 can only be decrypted by the node that encrypted it, and not any other node or device. Thus, for example, if mass storage device 262 were to be removed and attempts made to read the data on device 262, such attempts would be unsuccessful. BMonitor 250 uses the disk key to encrypt data to be stored on mass storage device 262 regardless of the source of the data. For example, the data may come from a client device (e.g., client 102 of FIG. 1) used by a customer of the tenant, from a management device (e.g., a device 110 of FIG. 1 or a console 240 or 242 of FIG. 3), etc.

In one implementation, the disk key is generated by combining the storage keys corresponding to each management device. The storage keys can be combined in a variety of different manners, and in one implementation are combined by using one of the keys to encrypt the other key, with the resultant value being encrypted by another one of the keys, etc.
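The chained combination described above can be sketched as follows. Because the Python standard library provides no symmetric cipher, this sketch substitutes a keyed hash (HMAC-SHA256) for each encryption step; that substitution, the function name `combine_storage_keys`, and the example key values are assumptions made for illustration and are not the combination technique of any particular embodiment.

```python
import hashlib
import hmac


def combine_storage_keys(storage_keys):
    """Fold a list of storage keys (bytes) into a single derived disk key.

    Each step keys HMAC-SHA256 with the next storage key and applies it to the
    running result, standing in for "encrypt the result with the next key".
    """
    if not storage_keys:
        raise ValueError("at least one storage key is required")
    result = storage_keys[0]
    for key in storage_keys[1:]:
        result = hmac.new(key, result, hashlib.sha256).digest()
    return result


landlord_key = b"landlord-storage-key"   # hypothetical example values
tenant_key = b"tenant-storage-key"
disk_key = combine_storage_keys([landlord_key, tenant_key])
```

The property this is meant to illustrate is that the derived disk key depends on every contributing storage key, so the disk key cannot be reconstructed once any one of them is unavailable.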

Additionally, BMonitor 250 operates as a trusted third party mediating interaction among multiple mutually distrustful management agents that share responsibility for managing node 248. For example, the landlord and tenant for node 248 do not typically fully trust one another. BMonitor 250 thus operates as a trusted third party, allowing the lessor and lessee of node 248 to trust that information made available to BMonitor 250 by a particular entity or agent is accessible only to that entity or agent, and no other (e.g., confidential information given by the lessor is not accessible to the lessee, and vice versa). BMonitor 250 uses a set of layered ownership domains (ODs) to assist in creating this trust. An ownership domain is the basic unit of authentication and rights in BMonitor 250, and each managing entity or agent (e.g., the lessor and the lessee) corresponds to a separate ownership domain (although each managing entity may have multiple management devices from which it can exercise its managerial responsibilities).

FIG. 6 is a block diagram illustrating an exemplary set of ownership domains in accordance with certain embodiments of the invention. Multiple (x) ownership domains 280, 282, and 284 are organized as an ownership domain stack 286. Each ownership domain 280-284 corresponds to a particular managerial level and one or more management devices (e.g., device(s) 110 of FIG. 1, consoles 240 and 242 of FIG. 3, etc.). The base or root ownership domain 280 corresponds to the actual owner of the node, such as the landlord discussed above. The next lowest ownership domain 282 corresponds to the entity that the owner of the hardware leases the hardware to (e.g., the tenant discussed above). A management device in a particular ownership domain can set up another ownership domain for another management device that is higher on ownership domain stack 286. For example, the entity that the node is leased to can set up another ownership domain for another entity (e.g., to set up a cluster of nodes implementing a database cluster).

When a new ownership domain is created, it is pushed on top of ownership domain stack 286. It remains the top-level ownership domain until either it creates another new ownership domain or its rights are revoked. An ownership domain's rights can be revoked by a device in any lower-level ownership domain on ownership domain stack 286, at which point the ownership domain is popped from (removed from) stack 286 along with any other higher-level ownership domains. For example, if the owner of node 248 (ownership domain 280) were to revoke the rights of ownership domain 282, then ownership domains 282 and 284 would be popped from ownership domain stack 286.
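The push and revoke behavior of the ownership domain stack can be illustrated with a short sketch. The class `OwnershipDomainStack` and its method names are hypothetical, and the string identifiers merely echo the reference numerals of FIG. 6.

```python
class OwnershipDomainStack:
    """Illustrative model of a stack of ownership domains, root at index 0."""

    def __init__(self, root_id):
        self._domains = [root_id]

    @property
    def top(self):
        return self._domains[-1]

    def push(self, creator_id, new_id):
        # Only the current top-level domain may create a new domain above itself.
        if creator_id != self.top:
            raise PermissionError("only the top-level domain can push")
        self._domains.append(new_id)

    def revoke(self, revoker_id, target_id):
        # A domain may be revoked only by a lower-level domain; the target and
        # every domain above it are popped together.
        revoker = self._domains.index(revoker_id)
        target = self._domains.index(target_id)
        if revoker >= target:
            raise PermissionError("revoker must be lower on the stack")
        popped = self._domains[target:]
        del self._domains[target:]
        return popped


stack = OwnershipDomainStack("280")
stack.push("280", "282")
stack.push("282", "284")
popped = stack.revoke("280", "282")   # owner revokes 282, which also pops 284
```

After the final call, domain 280 is once again the top-level ownership domain, matching the example in the preceding paragraph.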

Each ownership domain has a corresponding set of rights. In the illustrated example, the top-level ownership domain has one set of rights that include: (1) the right to push new ownership domains on the ownership domain stack; (2) the right to access any system memory in the node; (3) the right to access any mass storage devices in or coupled to the node; (4) the right to modify (add, remove, or change) packet filters at the node; (5) the right to start execution of software engines on the node (e.g., engines 252 of FIG. 5); (6) the right to stop execution of software engines on the node, including resetting the node; (7) the right to debug software engines on the node; (8) the right to change its own authentication credentials (e.g., its public key or ID); (9) the right to modify its own storage key; and (10) the right to subscribe to engine events, machine events, and/or packet filter events (e.g., notify a management console or other device when one of these events occurs). Additionally, each of the lower-level ownership domains has another set of rights that include: (1) the right to pop an existing ownership domain(s); (2) the right to modify (add, remove, or change) packet filters at the node; (3) the right to change its own authentication credentials (e.g., public key or ID); and (4) the right to subscribe to machine events and/or packet filter events. Alternatively, some of these rights may not be included (e.g., depending on the situation, the right to debug software engines on the node may not be needed), or other rights may be included (e.g., the top-level ownership domain may include the right to pop itself off the ownership domain stack).

Ownership domains can be added to and removed from ownership domain stack 286 numerous times during operation. Which ownership domains are removed and/or added varies based on the activities being performed. By way of example, if the owner of node 248 (corresponding to root ownership domain 280) desires to perform some operation on node 248, all higher-level ownership domains 282-284 are revoked, the desired operation is performed (ownership domain 280 is now the top-level domain, so the expanded set of rights are available), and then new ownership domains can be created and added to ownership domain stack 286 (e.g., so that the management agent previously corresponding to the top-level ownership domain is returned to its previous position).

BMonitor 250 checks, for each request received from an entity corresponding to one of the ownership domains (e.g., a management console controlled by the entity), what rights the ownership domain has. If the ownership domain has the requisite rights for the request to be implemented, then BMonitor 250 carries out the request. However, if the ownership domain does not have the requisite set of rights, then the request is not carried out (e.g., an indication that the request cannot be carried out can be returned to the requester, or alternatively the request can simply be ignored).
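The rights check might be sketched as a lookup against one of two rights sets, chosen by whether the requesting ownership domain is currently at the top of the stack. The right names below are invented labels for the rights enumerated earlier, and `request_permitted` is a hypothetical helper.

```python
# Illustrative labels for the two rights sets; the names are assumptions.
TOP_LEVEL_RIGHTS = frozenset({
    "push_domain", "access_memory", "access_storage", "modify_filters",
    "start_engine", "stop_engine", "debug_engine", "change_credentials",
    "modify_storage_key", "subscribe_engine_events",
    "subscribe_machine_events", "subscribe_filter_events",
})
LOWER_LEVEL_RIGHTS = frozenset({
    "pop_domain", "modify_filters", "change_credentials",
    "subscribe_machine_events", "subscribe_filter_events",
})


def request_permitted(domain_stack, domain_id, required_right):
    """Return True if the requesting domain holds the right the request needs.

    domain_stack is a list of domain IDs with the root first and the top last.
    """
    if domain_id not in domain_stack:
        return False                      # unknown domains hold no rights
    is_top = domain_id == domain_stack[-1]
    rights = TOP_LEVEL_RIGHTS if is_top else LOWER_LEVEL_RIGHTS
    return required_right in rights


domains = ["landlord", "tenant"]
tenant_can_start = request_permitted(domains, "tenant", "start_engine")
landlord_can_start = request_permitted(domains, "landlord", "start_engine")
```

With the tenant at the top of the stack, the landlord cannot start engines directly but retains the right to pop the tenant's domain, after which the landlord becomes the top-level domain.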

In the illustrated example, each ownership domain includes an identifier (ID), a public key, and a storage key. The identifier serves as a unique identifier of the ownership domain, the public key is used to send secure communications to a management device corresponding to the ownership domain, and the storage key is used (at least in part) to encrypt information stored on mass storage devices. An additional private key may also be included for each ownership domain for the management device corresponding to the ownership domain to send secure communications to the BMonitor. When the root ownership domain 280 is created, it is initialized (e.g., by BMonitor 250) with its ID and public key. The root ownership domain 280 may also be initialized to include the storage key (and a private key), or alternatively it may be added later (e.g., generated by BMonitor 250, communicated to BMonitor 250 from a management console, etc.). Similarly, each time a new ownership domain is created, the ownership domain that creates the new ownership domain communicates an ID and public key to BMonitor 250 for the new ownership domain. A storage key (and a private key) may also be created for the new ownership domain when the new ownership domain is created, or alternatively at a later time.

BMonitor 250 authenticates a management device(s) corresponding to each of the ownership domains. BMonitor 250 does not accept any commands from a management device until it is authenticated, and only reveals confidential information (e.g., encryption keys) for a particular ownership domain to a management device(s) that can authenticate itself as corresponding to that ownership domain. This authentication process can occur multiple times during operation of the node, allowing the management devices for one or more ownership domains to change over time. The authentication of management devices can occur in a variety of different manners. In one implementation, when a management device requests a connection to BMonitor 250 and asserts that it corresponds to a particular ownership domain, BMonitor 250 generates a token (e.g., a random number), encrypts the token with the public key of the ownership domain, and then sends the encrypted token to the requesting management device. Upon receipt of the encrypted token, the management device decrypts the token using its private key, and then returns the decrypted token to BMonitor 250. If the returned token matches the token that BMonitor 250 generated, then the authenticity of the management device is verified (because only the management device with the corresponding private key would be able to decrypt the token). An analogous process can be used for BMonitor 250 to authenticate itself to the management device.
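The token exchange described above can be illustrated with textbook RSA on a deliberately tiny key. The key values (n = 3233, e = 17, d = 2753, from primes 61 and 53), the function names, and the absence of padding are all simplifications assumed for this sketch; such a key offers no security and is used only to make the challenge-response steps concrete.

```python
import secrets

# Toy RSA key pair (primes 61 and 53); far too small for real use.
PUBLIC_KEY = (3233, 17)      # (n, e), held by BMonitor for the ownership domain
PRIVATE_KEY = (3233, 2753)   # (n, d), held only by the management device


def issue_challenge(public_key):
    """Monitor side: pick a random token and encrypt it with the public key."""
    n, e = public_key
    token = 2 + secrets.randbelow(n - 3)      # random token in the range 2..n-2
    return token, pow(token, e, n)


def answer_challenge(encrypted_token, private_key):
    """Management device side: decrypt the token with the private key."""
    n, d = private_key
    return pow(encrypted_token, d, n)


def verify(expected_token, returned_token):
    """Monitor side: the device is authenticated only if the tokens match."""
    return expected_token == returned_token


token, challenge = issue_challenge(PUBLIC_KEY)
response = answer_challenge(challenge, PRIVATE_KEY)
authenticated = verify(token, response)
```

Running the same three steps with the roles reversed gives the analogous process by which BMonitor 250 can authenticate itself to the management device.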

Once authenticated, the management device can communicate requests to BMonitor 250 and have any of those requests carried out (assuming it has the rights to do so). Although not required, it is typically prudent for a management console, upon initially authenticating itself to BMonitor 250, to change its public key/private key pair.

When a new ownership domain is created, the management device that is creating the new ownership domain can optionally terminate any executing engines 252 and erase any system memory and mass storage devices. This provides an added level of security, on top of the encryption, to ensure that one management device does not have access to information stored on the hardware by another management device. Additionally, each time an ownership domain is popped from the stack, BMonitor 250 terminates any executing engines 252, erases the system memory, and also erases the storage key for that ownership domain. Thus, any information stored by that ownership domain cannot be accessed by the remaining ownership domains: the memory has been erased so there is no data in memory, and without the storage key, information on the mass storage device cannot be decrypted. BMonitor 250 may alternatively erase the mass storage device too. However, by simply erasing the key and leaving the data encrypted, BMonitor 250 allows the data to be recovered if the popped ownership domain is re-created (and uses the same storage key).

FIG. 7 is a flow diagram illustrating the general operation of BMonitor 250 in accordance with certain embodiments of the invention. Initially, BMonitor 250 monitors the inputs it receives (block 290). These inputs can be from a variety of different sources, such as another node 248, a client computer via network connection 216 (FIG. 3), client operations management console 240, application operations management console 242, an engine 252, a management device 110 (FIG. 1), etc.

If the received request is a control request (e.g., from one of consoles 240 or 242 of FIG. 1, or a management device(s) 110 of FIG. 1), then a check is made (based on the top-level ownership domain) as to whether the requesting device has the necessary rights for the request (block 292). If the requesting device does not have the necessary rights, then BMonitor 250 returns to monitoring inputs (block 290) without implementing the request. However, if the requesting device has the necessary rights, then the request is implemented (block 294), and BMonitor 250 continues to monitor the inputs it receives (block 290). If, instead, the received request is a data request (e.g., inbound from another node 248 or a client computer via network connection 216, outbound from an engine 252, etc.), then BMonitor 250 either accepts or rejects the request (act 296), and continues to monitor the inputs it receives (block 290). Whether BMonitor 250 accepts a request is dependent on the filters 258 (FIG. 5), as discussed above.
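The branching of FIG. 7 can be sketched as a single dispatch function. This is only an illustrative sketch: the request dictionary layout, the rights table keyed by sender, and the filter callback are assumptions introduced here, not structures defined in the description.

```python
def handle_input(request, rights, data_filter):
    """Dispatch one received input, loosely following FIG. 7.

    request     -- dict with a "kind" of "control" or "data" (hypothetical layout)
    rights      -- maps a sender name to the set of commands it may issue under
                   the top-level ownership domain (hypothetical)
    data_filter -- callable returning True if a data request is permissible
    """
    if request["kind"] == "control":
        allowed = rights.get(request["sender"], set())
        if request["command"] not in allowed:
            return "ignored"        # block 292 fails: back to monitoring
        return "implemented"        # block 294
    if request["kind"] == "data":
        return "accepted" if data_filter(request) else "rejected"   # act 296
    return "ignored"


rights = {"console-240": {"reboot"}}


def only_node_248(request):
    # Hypothetical filter: accept data only from node 248.
    return request["source"] == "node-248"


results = [
    handle_input({"kind": "control", "sender": "console-240", "command": "reboot"},
                 rights, only_node_248),
    handle_input({"kind": "control", "sender": "console-242", "command": "reboot"},
                 rights, only_node_248),
    handle_input({"kind": "data", "source": "node-248"}, rights, only_node_248),
    handle_input({"kind": "data", "source": "unknown"}, rights, only_node_248),
]
```

In each case the caller would then return to monitoring inputs (block 290), which corresponds to calling the function again for the next request.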

FIG. 8 is a flowchart illustrating an exemplary process for handling outbound data requests in accordance with certain embodiments of the invention. The process of FIG. 8 is implemented by BMonitor 250 of FIG. 5, and may be performed in software. The process of FIG. 8 is discussed with additional reference to components in FIGS. 1, 3 and 5.

Initially, the outbound data request is received (act 300). Controller 254 compares the request to outbound request restrictions (act 302). This comparison is accomplished by accessing information corresponding to the data (e.g., information in a header of a packet that includes the data or information inherent in the data, such as the manner (e.g., which of multiple function calls is used) in which the data request was provided to BMonitor 250) and checking it against the outbound request restrictions maintained by filters 258. This comparison allows BMonitor 250 to determine whether it is permissible to pass the outbound data request to the target (act 304). For example, if filters 258 indicate which targets data cannot be sent to, then it is permissible to pass the outbound data request to the target only if the target identifier is not identified in filters 258.

If it is permissible to pass the outbound request to the target, then BMonitor 250 sends the request to the target (act 306). For example, BMonitor 250 can transmit the request to the appropriate target via transport medium 211 (and possibly network connection 216), or via another connection to network 108. However, if it is not permissible to pass the outbound request to the target, then BMonitor 250 rejects the request (act 308). BMonitor 250 may optionally transmit an indication to the source of the request that it was rejected, or alternatively may simply drop the request.
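The outbound process of FIG. 8 can be sketched for the case where the filters list the targets that data cannot be sent to. The function name, the request layout, and the send and notify callbacks below are hypothetical names introduced for illustration.

```python
def handle_outbound(request, blocked_targets, send, notify=None):
    """Pass an outbound request to its target unless the target is blocked.

    blocked_targets -- set of target identifiers the filters prohibit
    send            -- callable that transmits the request (act 306)
    notify          -- optional callable that tells the source of a rejection
    """
    if request["target"] in blocked_targets:    # acts 302 and 304
        if notify is not None:
            notify(request)                     # optional rejection notice
        return False                            # act 308: reject or drop
    send(request)                               # act 306
    return True


sent, rejected = [], []
blocked = {"node-17"}
ok = handle_outbound({"target": "node-3", "payload": b"hello"},
                     blocked, sent.append, rejected.append)
denied = handle_outbound({"target": "node-17", "payload": b"hello"},
                         blocked, sent.append, rejected.append)
```

Omitting the notify callback models the alternative in which the request is simply dropped.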

FIG. 9 is a flowchart illustrating an exemplary process for handling inbound data requests in accordance with certain embodiments of the invention. The process of FIG. 9 is implemented by BMonitor 250 of FIG. 5, and may be performed in software. The process of FIG. 9 is discussed with additional reference to components in FIG. 5.

Initially, the inbound data request is received (act 310). Controller 254 compares the request to inbound request restrictions (act 312). This comparison is accomplished by accessing information corresponding to the data and checking it against the inbound request restrictions maintained by filters 258. This comparison allows BMonitor 250 to determine whether it is permissible for any of software engines 252 to receive the data request (act 314). For example, if filters 258 indicate which sources data can be received from, then it is permissible for an engine 252 to receive the data request only if the source of the data is identified in filters 258.

If it is permissible to receive the inbound data request, then BMonitor 250 forwards the request to the targeted engine(s) 252 (act 316). However, if it is not permissible to receive the inbound data request from the source, then BMonitor 250 rejects the request (act 318). BMonitor 250 may optionally transmit an indication to the source of the request that it was rejected, or alternatively may simply drop the request.
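The inbound process of FIG. 9 can be sketched for the case where the filters list the sources that data can be received from. The function name, the request layout, and the per-engine queues are assumptions made for this illustration.

```python
def handle_inbound(request, allowed_sources, engine_queues):
    """Forward an inbound request to its targeted engine only if the source
    is identified in the filters.

    allowed_sources -- set of source identifiers the filters permit
    engine_queues   -- maps an engine name to a list of delivered payloads
    """
    if request["source"] not in allowed_sources:    # acts 312 and 314
        return False                                # act 318: reject or drop
    engine_queues[request["engine"]].append(request["payload"])   # act 316
    return True


queues = {"engine-252a": []}
allowed = {"node-248"}
accepted = handle_inbound(
    {"source": "node-248", "engine": "engine-252a", "payload": b"GET /"},
    allowed, queues)
refused = handle_inbound(
    {"source": "node-999", "engine": "engine-252a", "payload": b"GET /"},
    allowed, queues)
```

Unlike the outbound sketch, which blocks listed targets, this one permits only listed sources, matching the two examples given in the description.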

CONCLUSION

Although the description above uses language that is specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.

1. One or more computer-readable media having stored thereon a computer program that, when executed by one or more processors of a node in a co-location facility, causes the one or more processors to perform acts including: beginning and terminating execution of components on the node in response to received commands; and restricting which other nodes in the co-location facility components that are executing on the node can receive data from and send data to.
2. One or more computer-readable media as recited in claim 1, wherein a plurality of management devices share management responsibility for the node, and wherein beginning and terminating execution of components on the node is restricted to only one of the plurality of management devices at a time.
3. One or more computer-readable media as recited in claim 1, wherein the restricting comprises: checking whether it is permissible to forward received data to its intended target; and forwarding the received data to its intended target only if it is permissible to do so.
4. One or more computer-readable media as recited in claim 3, wherein the intended target comprises another node in the co-location facility.
5. One or more computer-readable media as recited in claim 3, wherein the intended target comprises at least one of the components executing on the node.
6. One or more computer-readable media as recited in claim 1, wherein the beginning and terminating execution of components comprises beginning and terminating execution of the components based on commands received from an operations console at a location remote from the co-location facility.
7. One or more computer-readable media as recited in claim 1, wherein one of the components comprises an operating system.
8. A method comprising: receiving, at a node in a co-location facility, a first request from a first control console that is local to the co-location facility; implementing the first request; receiving, at the node, a second request from a second control console that is remote from the co-location facility; and implementing the second request.
9. A method as recited in claim 8, wherein the first request comprises hardware operation oriented commands.
10. A method as recited in claim 8, wherein the second request comprises software application control oriented commands.
11. A method as recited in claim 8, wherein the first request corresponds to one of a first set of rights that are granted to the first control console, wherein the second request corresponds to one of a second set of rights that are granted to the second control console, and wherein the first set of rights is more restricted than the second set of rights.
12. One or more computer-readable memories containing a computer program that is executable by a processor to perform the method recited in claim 8.
13. One or more computer-readable media having stored thereon a computer program that, when executed by one or more processors of a node in a facility, causes the one or more processors to perform acts including: establishing a boundary of a server cluster in the facility, wherein the server cluster includes the node; and altering the boundary of the server cluster based on commands received from a console outside the server cluster.
14. One or more computer-readable media as recited in claim 13, wherein the establishing comprises including a filter that restricts access to another node that is in the facility but that is not in the server cluster.
15. One or more computer-readable media as recited in claim 13, wherein the establishing comprises generating a plurality of filters identifying only other nodes in the server cluster as being permissible to access.
16. One or more computer-readable media as recited in claim 13, wherein the computer program, when executed, further causes the one or more processors to perform acts including executing a software engine in response to a command received from the console.
17. One or more computer-readable media as recited in claim 13, wherein the computer program, when executed, further causes the one or more processors to perform acts including terminating execution of a software engine in response to a command received from the console.
18. One or more computer-readable media as recited in claim 13, wherein the facility comprises a co-location facility.